Motivation—Tim

This year, a report from Centers for Disease Control and Prevention revealed that America’s obesity rate has reached a record high. Contrasting to popular belief, New Yorkers are not so safe from the obesity epidemic, as more than half of adult New Yorkers are either overweight or obese. Studies show that the rise in the obesity epidemic is partly due to disparities in food environment; it is harder for some to eat healthier because their options are limited.

For example, I and my fellow classamate often go out to McDonald that are near our home for convenience, even though these restaurants were not healthy. Environment can affect our health behaviours imperceptibly.

This project intends to look deeper into the relationship between food environment in NYC and obesity rate along with diabetes rate and stroke hospitalization rate.

Initial questions—Tim

1. Can the percentage of nation-wide chain and fast-food restaurants in different neighborhood of NYC be related to their obesity, diabetes and stroke hospitalization rate ?

2.Can the percentage of fast-food restaurant (defined by cuisine description in the restaurant inspection dataset) in different neighborhood of NYC be related to their obesity, diabetes and stroke hospitalization rate ?

3.Can the percentage of health inspection grade A restaurant in different neighborhood of NYC be related to their obesity, diabetes and stroke hospitalization rate?

  1. Can the composite restaurant health score of different neighborhood of NYC be related to their obesity, diabetes and stroke hospitalization rate?

For the first three questions, we stick to it. Since the percentage of health inspection grade A restaurant is not a significant predictor, we later put it into the model as a confounder. Moreover, we deleted the fourth problem. Composite restaurant health score were to subjective and difficult to assess. Moreover, we calculated the boro level model first due to the difficulties we encountered in scaping zipcode-neighborhood data. However, after some struggle, we still managed to get the neighborhood information and analyze the data accordingly.

Data and Methods

Data Source and Collection—Tim

1. NYC restaurant inspection:

https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

This dataset contains the data for restaurant inspection in NYC from August 1, 2014 to December 5th, 2017. Every row is a restaurant inspection record that includes the name of the restaurant, zipcode, cuisine type description, inspection grade (A as the best grade) and so on. Since this dataset has the inspection data for 3 years, it is almost certained that it has the data for all the restuarants in NYC in 2014.

Variable used in this datasets
DBA: Name of the restaurant
BORO: name of the boro
ZIPCODE: the zipcode of the restaurant
CUISINE DESCRIPTION: the kind of food that the restaurant is providing
GRADE: Inspection grade for the specific inspection.

2. 2015 Community Health Profiles Open Data:

https://www1.nyc.gov/site/doh/data/data-publications/profiles.page

This dataset contains NYC every neighborhoods’ demographic (percentage of white race, poverty percentage), health (age-adjusted percent of adult exercised in the last 30 days, age-adjusted percent of adults as a smoker) and our main outcome (Age-adjusted percent of adults that is obese (BMI of 30 or gthe reater), ge-adjusted percent of adults, Age-adjusted rate of hospitalizations due to stroke (cerebrovascular disease) per 100,000 adults)

Variable used in this datasets
Name: the name for the neigborhood.
Racewhite_Rate: the percentage for white race
Poverty: Percent of individuals living below the federal poverty threshold
Smoking: age-adjusted percent of adults as a smoker Exercise: Age-adjusted percent of adults that reported getting any exercise in the last 30 days
Obesity: Age-adjusted percent of adults that is obese (BMI of 30 or greater) based on self-reported height and weight
Diabetes: Age-adjusted percent of adults that had ever been told by a healthcare professional that they have diabetes
Stroke_Hosp: Age-adjusted rate of hospitalizations due to stroke (cerebrovascular disease) per 100,000 adults

3. Web scraping for what zipcodes each neighborhood contains
Because Community Health Profiles Open Data is has only neighborhood level information, so we have to aggregate neighborhood level information from the restaurant inspeciton data. However, the restaurant inspection data was using zipcodes for every restaurant. Therefore, we had to add the neighborhood name to every restaurant, so that we can group by neighborhood then calculate the percentage for every neighborhood.

First, we scraped the table data from https://www.health.ny.gov/statistics/cancer/registry/appendix/neighborhoods.htm. However the neighborhood name and combinations were different from the health profile data we downloaded. We cannot merge the two datasets. Then we tried to look for the raw classification for the neigborhoods in health data from New York University’s Furman Center for Real Estate and Urban Policy and the NYC Department of City Planning. However, still no luck becuase the neighborhood was not divided by zipcode areas. Then we tried the hardest way: search the name of the neighborhood on Google and finding the matching zipcodes. It worked out well, but it was not precise, because neighborhoods were not divided according to the area of the zipcode. Different neighborhoods have the same zipcodes.
In order to be unbiased in this process, if two neighborhoods contains the same zipcode area, we will randomly assign this zipcode to only one neighborhood.

Data Import and Cleaning—Tim

Downloading restaurant inspection data from NYC open data:

get_all_inspections = function(url) {
  
  all_inspections = vector("list", length = 0)
  
  loop_index = 1
  chunk_size = 50000
  DO_NEXT = TRUE
  
  while (DO_NEXT) {
    message("Getting data, page ", loop_index)
    
    all_inspections[[loop_index]] = 
      GET(url,
          query = list(`$order` = "zipcode",
                       `$limit` = chunk_size,
                       `$offset` = as.integer((loop_index - 1) * chunk_size)
                       )
          ) %>%
      content("text") %>%
      fromJSON() %>%
      as_tibble()
    
    DO_NEXT = dim(all_inspections[[loop_index]])[1] == chunk_size
    loop_index = loop_index + 1
  }
  
  all_inspections
  
}

url = "https://data.cityofnewyork.us/resource/9w7m-hzhe.json"

rest_inspection = get_all_inspections(url) %>%
  bind_rows() 

Downloading health and demographic data CSV into local folder and reading, cleaning it:

download.file("https://www1.nyc.gov/assets/doh/downloads/excel/episrv/2015_CHP_PUD.xlsx", mode="wb", destfile = "health.xlsx")
health <- read_excel("health.xlsx", sheet = "CHP_all_data") %>% 
  select(Name, Racewhite_Rate, Poverty, Unemployment,
         Smoking, Exercise,
         Obesity, Diabetes, Stroke_Hosp) %>% 
  clean_names() 

Matching the neighborhood names with restaurant zipcodes:

zip_neighbor <- read_csv("neigh_zipcode.csv") %>% 
  mutate(zipcode = as.character(zipcode))
##restaurant data with neighbourhood
rest_neighborhood = left_join(rest_inspection, zip_neighbor, by = "zipcode") %>% 
  filter(!is.na(neighborhood))

Exploratory Analysis

Health—Tim&Austin

health_only_neighborhood <- health[-c(1:6),] %>% 
  rename(neighborhood = name) %>% 
  mutate(neighborhood = as.factor(neighborhood))
##Plotting for outcome in different neighborhood


health_only_neighborhood %>%
  mutate(neighborhood = fct_reorder(neighborhood, obesity)) %>% 
  filter(obesity > 25) %>% 
  ggplot(aes(x = neighborhood,y = obesity, fill = neighborhood)) + geom_col() +  
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none") +
  labs(title = "Neighborhood with 25 percent more obesity rate")



health_only_neighborhood %>%
  mutate(neighborhood = fct_reorder(neighborhood, diabetes)) %>% 
  filter(diabetes > 10) %>% 
  ggplot(aes(x = neighborhood,y = diabetes, fill = neighborhood)) + geom_col() +   
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none") +
  labs(title = "Neighborhood with 10 percent more diabetes rate")



health_only_neighborhood %>%
  mutate(neighborhood = fct_reorder(neighborhood, stroke_hosp)) %>% 
  filter(stroke_hosp > 300) %>% 
  ggplot(aes(x = neighborhood,y = stroke_hosp, fill = neighborhood)) + 
  geom_col() +    
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none") +
  labs(title = "Neighborhood with 300 more stroke hospitalization in 100,000 adults")

Summary: Williamsbridge and Baychester have the highest obesity rate, up to 32%. East New York and Starrett City have the highest diabetes rate, up to 17 percent. BUshwick has the highest stroke hospitalization in 100,000 adults, up to 300 adults. Morrisania and Crotona has the highest percent of poverty rate, up to 40%. Fiancial district has the highest percent of young peoeple, up to 67%.

Restaurant data

Fastfood restaurants by cuisine type—Ruiwei

# Identify fastfood restaurants by their cuisine types. We print out the cuisine type list (n=85) and let everyone circle the ones they think is fastfood and we use the union as our rule (use union because it's more conservative).

rest_inspection %>%
  mutate(dba = str_to_upper(dba)) %>%
  arrange(cuisine_description) %>%
  count(cuisine_description)
# calculating the total number of restaurants and the number of fastfood restaurants in the neighborhood, as well as the percentage of fastfood restaurants.

neighborhood_list = 
  rest_neighborhood %>%
  distinct(neighborhood) %>%
  arrange(neighborhood)
  
rest_fastfood_neighborhood = 
  rest_neighborhood %>%
  filter(cuisine_description %in% c("Bagels/Pretzels",
                                    "Bottled beverages, including water, sodas, juices, etc.",
                                    "Chicken",
                                    "Delicatessen",
                                    "Donuts",
                                    "Hamburgers",
                                    "Hotdogs",
                                    "Hotdogs/Pretzels",
                                    "Ice Cream, Gelato, Yogurt, Ices",
                                    "Nuts/Confectionary",
                                    "Pancakes/Waffles",
                                    "Pizza",
                                    "Soul Food",
                                    "Sandwiches",
                                    "Sandwiches/Salads/Mixed Buffet",
                                    "Soups & Sandwiches"))

percent_fastfood_neighborhood = function(name_neighborhood){

  rest_each_neighborhood =
    rest_neighborhood %>%
    filter(neighborhood == name_neighborhood) %>%
    distinct(camis)
  
  n_rest_neighborhood = nrow(rest_each_neighborhood)

  rest_fastfood_distinct_neighborhood = 
    rest_fastfood_neighborhood %>%
    filter(neighborhood == name_neighborhood) %>%
    distinct(camis, cuisine_description)
  
  n_fastfood_neighborhood = nrow(rest_fastfood_distinct_neighborhood)
    
  percent_fastfood_neighborhood = n_fastfood_neighborhood/n_rest_neighborhood
  
  tibble(
    neighborhood = name_neighborhood,
    n_fastfood = n_fastfood_neighborhood,
    n_rest = n_rest_neighborhood,
    percent_fastfood = percent_fastfood_neighborhood
  )
}

fastfood_neighborhood = 
  map(neighborhood_list$neighborhood, percent_fastfood_neighborhood) %>%
  bind_rows() %>%
  mutate(neighborhood = str_to_upper(neighborhood))
fastfood_neighborhood %>%
  mutate(neighborhood = as.factor(neighborhood),
         n_rest = as.numeric(n_rest),
         n_nonfastfood = n_rest - n_fastfood,
         neighborhood = fct_reorder(neighborhood, percent_fastfood)) %>%
  plot_ly(., x = ~neighborhood, y = ~n_fastfood, type = 'bar', name = 'fastfood restaurants') %>%
  add_trace(y = ~n_nonfastfood, name = 'non-fastfood restaurants') %>%
  layout(yaxis = list(title = 'number of restaurants'), 
         xaxis = list(title = 'neighborhood (ordered by percentage of fastfood restaurants)',
                      tickangle = 45),
         barmode = 'stack')

From the plot, we can see that, while the Greenwich Village and SOHO neighborhood has fairly large number of restaurants, it has the smallest percentage of fastfood restaurants. Williamsbridge and Baychester has the largest percentage of fastfood restaurants.

When large number of total restaurants is not equal to large percentage of fastfood restaurants in that neighborhood, we can conclude that the distribution of fastfood restaurants is not even across neighborhoods, which also implies the motivation of our study, we want to investigate if this uneven distribution of fastfood restaurants is associated with diffferent level of chronic disease outcomes within a neighborhood.

Restaurant Chains—Sara

## Matching list of chain restaurants in U.S. and restaurant inspection data

chains_html = read_html("https://en.wikipedia.org/wiki/List_of_restaurant_chains_in_the_United_States#Fast-casual")

chain_rest = chains_html %>%
  html_nodes("ul:nth-child(9) li , .jquery-tablesorter tr:nth-child(1) td:nth-child(1)") %>%
  html_text() %>%
  as.tibble() %>%
  mutate(dba = value, 
         dba = str_to_upper(dba)) %>%
  select(dba)
# read in the list of chain restaurants in us 
# made the names to uppercase and changed the var name to dba
head(chain_rest, 10)
## # A tibble: 10 x 1
##                       dba
##                     <chr>
##  1        A&W RESTAURANTS
##  2             APPLEBEE'S
##  3             BAJA FRESH
##  4          BOSTON MARKET
##  5     BUFFALO WILD WINGS
##  6                CHILI'S
##  7 CHIPOTLE MEXICAN GRILL
##  8           CICI'S PIZZA
##  9    COLD STONE CREAMERY
## 10     CORNER BAKERY CAFE

I have scraped list of 75 national chain restaurants from the wikipedia page (https://en.wikipedia.org/wiki/List_of_restaurant_chains_in_the_United_States#Fast-casual) now I will join this dataset with restaurant inspection data to choose only the chain restaurants in NYC.

# removing punctuations in chain_rest & restaurant inspections (neighborhoods)
chain_rest_str = 
  chain_rest  %>%
  mutate(dba = str_replace_all(dba, "[[:punct:]]", ""))

rest_neigh_str = rest_neighborhood %>% 
  mutate(dba = str_replace_all(dba, "[[:punct:]]", ""))

# Matching the two datasets(restaurant inspection data that has all punctuation removed from dba(restaurant name) and list of chain restaurants by dba)

neighborhood_chain = 
  right_join(rest_neigh_str, chain_rest_str) %>%
  filter(!is.na(camis)) %>%
  distinct(camis, dba, neighborhood, boro)
## Joining, by = "dba"

neighborhood_chain %>%
  group_by(dba) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(10)
## # A tibble: 10 x 2
##                       dba     n
##                     <chr> <int>
##  1          DUNKIN DONUTS   453
##  2                 SUBWAY   364
##  3              STARBUCKS   307
##  4              MCDONALDS   216
##  5 CHIPOTLE MEXICAN GRILL    78
##  6                 WENDYS    46
##  7              APPLEBEES    28
##  8          BOSTON MARKET    24
##  9           WHITE CASTLE    23
## 10              PIZZA HUT    21

The combined dataframe, nyc_chain has nrow(nyc_chain) observations. Also, there were 33 different chain restaurants extracted. The chart is showing the 10 chain restaurants in descending order.

# counting chains in neighborhoods

neigh_count_chain = neighborhood_chain %>%
  group_by(neighborhood, boro) %>%
  summarise(chain_n = n())

neigh_count_rest = rest_neighborhood %>%
  distinct(neighborhood, camis) %>%
  group_by(neighborhood) %>%
  summarise(res_n = n())

# calculating percentage of chains in each boro
percent_neighborhood_chain = left_join(neigh_count_chain, neigh_count_rest) %>%
  ungroup() %>%
  mutate(chain_percentage = chain_n/res_n,
         neighborhood = str_to_upper(neighborhood))
## Joining, by = "neighborhood"

plot_chain_neighbor = percent_neighborhood_chain %>%
  mutate(neighborhood = forcats::fct_reorder(neighborhood, chain_percentage)) %>%
  ggplot(aes(neighborhood, chain_percentage, fill = boro)) + geom_bar(stat="identity") 

ggplotly(plot_chain_neighbor)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

I have plotted neighborhoods of NYC against their percentage of chain restaurants and grouped them by borough. We see that all the neighborhodds that had a percentage of chain restuarants less than 5% were all part of Brooklyn except Greenwhich village and Soho of Manhattan. Neighborhoods in Queens and Manhattan are spread out across low to high percentage of chain restaurants while most of neighborhoods in Bronx and Staten Island have high percentages. The neighbor hood that had the highest percentage of chain restaurants were Throgs neck and Co-op City of Bronx with 16.5% of chain restaurats out of all restaurats.

Inspection Grade—Austin

gradea_neighborhood =
  rest_neighborhood %>%
  group_by(neighborhood, grade) %>%
  summarise(n = n()) %>%
  mutate(grade_percent = n / sum(n)) %>% 
  filter(grade == "A") %>% 
  ungroup(boro) %>%
  mutate(neighborhood = str_to_upper(neighborhood))
gradea_neighborhood %>% 
  mutate(neighborhood = as.factor(neighborhood),
         neighborhood = fct_reorder(neighborhood,grade_percent)) %>% 
  plot_ly(x = ~neighborhood, y = ~grade_percent, color = ~neighborhood, type = "bar")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

Different from the situation in borough, the difference of percentage of “grade-A” restaurants in each borough exists. “Throgs Neck and Co-op City” has the greatest grade-A restaurants percentage, around 44.50%. “Sunset Park”, however, has the least, around 30.62%. “Washington Heights”, where we live, takes the fourth from the end, 34.39%, which is obviously consistent with the feeling we have towards the restaurant condition of “Washington Heights”.

Formal Analysis and Findings

Model Selection Process

health_neighborhood = 
  health %>%
  mutate(neighborhood = str_to_upper(name)) %>%
  select(-name)

combined_chain = 
  percent_neighborhood_chain %>%
  select(neighborhood, chain_percentage)
combined_chain_fastfood = 
  fastfood_neighborhood %>%
  mutate(fastfood_percent = percent_fastfood) %>%
  select(neighborhood, fastfood_percent) %>%
  right_join(combined_chain, by = "neighborhood")
combined_chain_fastfood_gradea = 
  gradea_neighborhood %>%
  select(neighborhood, grade_percent) %>%
  right_join(combined_chain_fastfood, by = "neighborhood")

combined_model =  
  left_join(combined_chain_fastfood_gradea, health_neighborhood, by = "neighborhood")
lm(obesity ~ fastfood_percent + grade_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 31.9211243 15.3984162 2.0730135 0.0431433
fastfood_percent 59.4287281 16.4442064 3.6139615 0.0006802
grade_percent -28.2875894 27.9231907 -1.0130500 0.3157269
racewhite_rate -0.0925686 0.0428847 -2.1585476 0.0355225
poverty 0.0514464 0.0956373 0.5379318 0.5929192
smoking 0.7632678 0.2462693 3.0993214 0.0031276
exercise -0.2324972 0.1439111 -1.6155610 0.1122412

lm(obesity ~ chain_percentage + grade_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 26.9238921 16.1267974 1.6695126 0.1010260
chain_percentage 68.2439998 26.3482159 2.5900805 0.0124195
grade_percent -6.1900297 27.7492092 -0.2230705 0.8243546
racewhite_rate -0.1308782 0.0418674 -3.1260190 0.0028985
poverty 0.1357973 0.1065868 1.2740539 0.2083092
smoking 0.8302436 0.2575427 3.2237124 0.0021876
exercise -0.2204302 0.1532812 -1.4380771 0.1564036

lm(obesity ~ fastfood_percent + chain_percentage + grade_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 31.4461097 15.5523111 2.0219573 0.0484418
fastfood_percent 52.8234591 22.0338610 2.3973764 0.0202136
chain_percentage 15.2539278 33.5366773 0.4548431 0.6511524
grade_percent -29.7598701 28.3241688 -1.0506882 0.2983551
racewhite_rate -0.0920098 0.0432330 -2.1282332 0.0381711
poverty 0.0710215 0.1055479 0.6728842 0.5040586
smoking 0.7603044 0.2482547 3.0625984 0.0034987
exercise -0.2223368 0.1467317 -1.5152612 0.1358801

We originally have two main predictors of interest, percentage of chain restaurants and percentage of fastfood restaurants, and they are both significantly associated with the three chronic disease health oucomes when solely in the model after adjusting for other potential confounders. Taking obesity as example, the p-value for fastfood_percent is 0.0006801903 and for chain_percent is 0.012419514.

We will need to make decision on which to keep as the final main predictor. So we put the two predictors in the same model and


lm(diabetes ~ fastfood_percent + grade_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 17.1764761 6.4782224 2.6514181 0.0105992
fastfood_percent 16.9191098 6.9181937 2.4455964 0.0178828
grade_percent 1.6974935 11.7474835 0.1444985 0.8856655
racewhite_rate -0.0787451 0.0180419 -4.3645736 0.0000607
poverty 0.0483703 0.0402353 1.2021846 0.2347381
smoking 0.1523610 0.1036073 1.4705630 0.1474349
exercise -0.1449503 0.0605444 -2.3941147 0.0203017
lm(diabetes ~ fastfood_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 17.6786699 5.4163283 3.263958 0.0019270
fastfood_percent 17.4843100 5.6533466 3.092736 0.0031611
racewhite_rate -0.0778464 0.0167787 -4.639603 0.0000233
poverty 0.0469398 0.0386366 1.214906 0.2297869
smoking 0.1529457 0.1025675 1.491172 0.1418447
exercise -0.1443873 0.0598582 -2.412154 0.0193531

Final Models and Findings

lm(obesity ~ fastfood_percent + grade_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 31.9211243 15.3984162 2.0730135 0.0431433
fastfood_percent 59.4287281 16.4442064 3.6139615 0.0006802
grade_percent -28.2875894 27.9231907 -1.0130500 0.3157269
racewhite_rate -0.0925686 0.0428847 -2.1585476 0.0355225
poverty 0.0514464 0.0956373 0.5379318 0.5929192
smoking 0.7632678 0.2462693 3.0993214 0.0031276
exercise -0.2324972 0.1439111 -1.6155610 0.1122412

lm(diabetes ~ fastfood_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 17.6786699 5.4163283 3.263958 0.0019270
fastfood_percent 17.4843100 5.6533466 3.092736 0.0031611
racewhite_rate -0.0778464 0.0167787 -4.639603 0.0000233
poverty 0.0469398 0.0386366 1.214906 0.2297869
smoking 0.1529457 0.1025675 1.491172 0.1418447
exercise -0.1443873 0.0598582 -2.412154 0.0193531

lm(stroke_hosp ~ fastfood_percent + grade_percent + racewhite_rate + poverty + smoking + exercise, combined_model) %>%
  broom::tidy() %>%
  kable()
term estimate std.error statistic p.value
(Intercept) 128.733029 159.5106855 0.8070496 0.4233141
fastfood_percent 573.540649 170.3439236 3.3669569 0.0014362
grade_percent -414.861424 289.2535987 -1.4342481 0.1574881
racewhite_rate -1.373925 0.4442382 -3.0927657 0.0031864
poverty 1.564361 0.9906978 1.5790491 0.1203893
smoking 6.603847 2.5510800 2.5886475 0.0124653
exercise 1.909576 1.4907614 1.2809397 0.2058984

We have three final models each having a health outcome (prevalence of obesity, prevalence of diabetes, and stroke hospitalization rate) as the dependent variable and percentage of fastfood as the main predictor. Obesity and percentage of fastfood restuarants model gave the most significant results with p-value of the main predictor being 0.0005364 (<0.01). In other words, at significance level of 1%, every 0.1 percent increase in fastfood restaurants in a neighborhood is associated with 5% increase in the neighborhood’s obesity prevalence, while adjusting for other factors.

The p-value of the regression coefficients for model with outcomes as diabetes/stroke were both arround 0.003 (<0.01), indicating there is a significant linear relationship between diabetese/stroke and percentage of fastfood restaurants at 1% significance level.

It is reasonable that among the three health outcomes, obesity has the strongest linear association with food environment, in this case, percentage of fastfood restaurants. It is a health condition that has the most direct relation with one’s diet. Moreover, it has a higher prevalence than diabetes and stroke, which could lead to having a lower p-value than the other two health conditions

Conclusion—to do later

write

Discussion—to do later

write